This paper estimates demand for Internet portals using a clickstream data panel of
2654 users. It shows that familiar econometric methodologies used to study grocery
store scanner data can be applied to analyze advertising-supported Internet markets
using clickstream data. In particular, it applies the methodology of Guadagni and
Little (1983) to better understand households' Internet portal choices. The methodology
has reasonable out-of-sample predictive power and can be used to simulate
changes in company strategy. (JEL classification numbers: M31, C25)

1. Introduction

The growth of the Internet has provided economists, marketers, and statisticians with a
potentially rich and informative data source. Since everything on the Internet is necessarily
digital, all activity can be easily recorded and stored in a database for future examination.
This data has found disparate uses, from advertisement targeting to law enforcement. One
prevalent but relatively underused example of such data is clickstream data. This data
consists of each website visited by a panel of users and the order in which they arrive at 
the sites. It is often accompanied by the time of arrival at and departure from the site as 
well as the degree of activity at the site and the demographic characteristics of the users. 
Examples of companies that collect clickstream data based on broad panels are Netratings 
Inc., MediaMetrix Inc., and Plurimus Corp. This paper uses data from Plurimus Corp. to 
analyze user choice of Internet portals. It will show that commonly used econometric models 
for examining grocery scanner data can be applied to clickstream data on advertising-based 
online markets. 

A portal is a launching pad to the Internet. Portals, such as Yahoo, Lycos, and MSN, are 
sometimes referred to as search engines. Adar and Huberman (1999, p. 2) describe them as
"a refinement of the web search engine service". Portals have search engine capabilities, but
they also have other features. These may include email, news, and a link-based directory 
to the web separate from the search service. There are few, if any, pure search engines 
remaining. In this paper, I am interested in the portal as a starting point and not as a 
destination. Therefore I look at the use of portal main pages, directory pages, and search 
pages, but not at email, news, and shopping pages. 

The methodology used here closely mimics that of Guadagni and Little's (1983) paper 
that estimates a multinomial logit model with scanner data to examine consumer coffee 
purchases. It shows that the model has reasonably good out-of-sample predictive ability. 
Furthermore, informative simulations can be conducted on the effects on market share of 
changing a variable. For example, it can derive an estimate of the impact on number of 
visits of increasing advertising by one dollar. The results, however, have to be interpreted 
with caution. The data does not satisfy the Independence of Irrelevant Alternatives (IIA)
assumption made in the model. This assumption implies that there is no correlation between
the alternatives outside of the observed variables. When this assumption fails, estimating
switching behavior becomes difficult. The coefficients and the simulation results will
therefore have some bias. Future work will apply more recent techniques from
the econometric analysis of panel data to the Internet portal market. This will alleviate
the above problem.

Guadagni and Little take advantage of the richness of their data set by treating each 
purchase as a separate observation. Those few studies that have used clickstream data thus 
far (such as Goldfarb (2000a), Moe and Fader (2000), Sandvig (2000)), have aggregated 
the data to a market share level. While this has provided interesting insights into specific 
problems, it is not the best approach to understanding website choice and the causes of 
website shares within a given market. Aggregating the data deletes considerable relevant 
information. Important determinants of website choice include an individual's past experi- 
ence at a site and the site that the individual went to the previous time. Unlike most other 
marketing studies using choice-specific data, there is no monetary price here. Goettler and
Shachar (1999) examine a similar no-price setting: a consumer panel of individual
choices of television shows.

Developing a framework to study consumer choices of free (advertising-supported) web- 
sites is an essential step to better understanding user behavior on the Internet. According 
to the data set used in this study, more than two-thirds of all consumer Internet traffic is at 
advertising-supported sites. With the exception of Amazon and eBay, the top twenty sites
in terms of unique visitors are all advertising-supported. The literature on this important 
aspect of the Internet is sparse. Three studies that focus on advertising-supported websites 
are Adar and Huberman (1999), Gandal (2001), and Goldfarb (2000a). Adar and Huber- 
man (1999) show that portals can discriminate between users as those looking for certain 
topics are willing to spend more time. This means that search engines can capture more
consumer surplus (in the form of advertising revenue) by forcing consumers who are willing
to spend more time to view more pages and advertisements. Gandal (2001) examines
market share at an aggregate level to study the portal market. He finds that early
entrants have an advantage and that certain features matter more than others. Goldfarb 
(2000a) examines concentration levels in advertising-supported Internet markets. 

Lynch and Ariely (2000) is one of the few Internet studies that look at choice-specific data.
They construct a simulated environment for the purchase of wine and examine purchase
choice. Like Lynch and Ariely's study, this paper takes advantage of the choice-specific
data. Unlike their study, I look at the choice of free websites using actual user clickstreams.

The main data for this study was supplied by Plurimus Corporation. It is a clickstream 
data set consisting of every website visited by 2654 users from December 27, 1999 to March
31, 2000. It also contains data on the time of arrival at and departure from each site. In
total, the data set contains 3,228,595 website visits, of which 859,587 (2622 people) are to 
Internet portals. Using this data, I construct measures of past search success, past time 
spent searching, whether a site is an individual's starting page, whether an individual has 
an email account at the site, and the number of pages viewed at each site. A considerable 
section of this paper is dedicated to explaining the construction of these variables from the 
raw data. Many of the decisions were based on a questionnaire conducted in June 2000 
(see Goldfarb 2000b for further details). I link the Plurimus data to monthly advertising
spending data from J. Walter Thompson Company and media mentions data found through 
the Lexis-Nexis Academic Universe. 

The next section of the paper will describe the application of the methodology used by
Guadagni and Little to the present problem. Section three will describe the data set, the 
questionnaire used to inform data construction, the actual process of data construction, and 
summary statistics. Section four will present the results, test the model's predictive ability, 
and examine market response to changes in the control variables. The paper will conclude 
by summarizing the key results and proposing several potential areas for future research. 

2. Using the Multinomial Logit With Clickstream Data 

Internet users choose which website to visit just as they make several other economic choices: 
given the alternatives available and the information they have about those alternatives, 
they choose the alternative that will give them the highest utility. In terms of grocery
products such as coffee (studied by Guadagni and Little), this means that households buy
the product that has the best attributes for the lowest price. In terms of portals, this means
that households will use the portal that will allow them to maximize the probability of
finding what they seek and minimize the time spent. Conceptually, I assume households
are exogenously given a "goal" when they go online. I explore this assumption in the 
questionnaire part of this paper and in Goldfarb (2000b). They go to the portal that they 
expect will help them achieve that goal in the least time with the most accuracy. 

In the multinomial logit model, the expected utility of the portal is based on past history,
several website characteristics (that may vary over time), outside influences such as
advertising and media mentions, and an idiosyncratic error term. Formally, household $i$
visits website $j$ at choice occasion $t$ when

$$E u_{ijt} > E u_{ikt} \qquad (1)$$

for all $k \neq j$. Here $u_{ijt}$ is defined by

$$E u_{ijt} = X_{ijt}\beta_{ijt} + \varepsilon_{ijt} \qquad (2)$$

$X_{ijt}$ may include variables that change over any or all of $i$, $j$, and $t$. $\beta$ may vary over
$i$, $j$, or $t$, implying household heterogeneity, brand heterogeneity, time (choice occasion)
heterogeneity or any combination of the three. In this study, $X_{ijt}$ will never vary over just
$t$, just $i$, just $t$ and $i$, or just $t$ and $j$. It will vary over just $j$ in the form of portal-specific
dummy variables. $\beta$ will be assumed constant. Future work will look at heterogeneity
across households in $\beta$. There are $I$ households, $J$ websites, $T_i$ choice occasions for each
household, and $\sum_{i=1}^{I} T_i$ total choice occasions.

It is expected utility to the user, not to the observer, that is of interest. It is assumed
that the user knows $\varepsilon_{ijt}$. The expectation is taken over relevant variables that the user may
not know the value of before visiting the website. For example, the user does not know how 
long she will spend on the website. She does, however, have an expectation of how long it 
will take based on her past experience at that website. 

In order to get the multinomial logit form, the $\varepsilon_{ijt}$ are assumed to be independently
distributed random variables with a type I extreme value (Gumbel) distribution. Given the above
assumptions, the probability of household $i$ choosing brand $j$ at choice occasion $t$ can be
expressed as:

$$P_{ijt} = \frac{\exp(X_{ijt}\beta)}{\sum_{k=1}^{J} \exp(X_{ikt}\beta)} \qquad (3)$$



The model, as expressed above, is a combination of Theil's (1969) multinomial logit and
McFadden's (1974) conditional logit. It is commonly referred to as a mixed logit or as a
multinomial logit. Since this paper assumes $\beta$ is fixed, the model here is a conditional logit.
The log likelihood function is as follows:

$$\log L = \sum_{i=1}^{I} \sum_{t=1}^{T_i} \sum_{j=1}^{J} d_{ijt} \log P_{ijt} \qquad (4)$$

where $P_{ijt}$ is the probability that household $i$ chooses alternative $j$ at choice occasion $t$,
and $d_{ijt}$ is equal to one if alternative $j$ is chosen by individual $i$ at time $t$, and is equal to
zero otherwise.
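
As a minimal sketch of the choice probability and log likelihood described above, the following Python code computes logit probabilities for one choice occasion and sums the log probabilities of the chosen alternatives. The function names, the toy covariates (a loyalty measure and a start-page dummy), and the coefficient values are illustrative assumptions, not the paper's data or estimates.

```python
import math

def choice_probabilities(X, beta):
    """Logit probabilities for one choice occasion.

    X is a list of covariate vectors, one per alternative j;
    beta is the common coefficient vector (assumed fixed across i, j, t).
    """
    utilities = [sum(x * b for x, b in zip(row, beta)) for row in X]
    m = max(utilities)  # subtract the max for numerical stability
    expu = [math.exp(u - m) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]

def log_likelihood(occasions, beta):
    """Sum of log P(chosen alternative) over all choice occasions.

    Each occasion is (X, chosen_index). The indicator d_ijt is one only for
    the chosen index, so only that term survives in the sum over j.
    """
    ll = 0.0
    for X, chosen in occasions:
        ll += math.log(choice_probabilities(X, beta)[chosen])
    return ll

# Hypothetical example: 3 portals, 2 covariates (loyalty, start-page dummy).
X_example = [[0.6, 1.0], [0.3, 0.0], [0.1, 0.0]]
beta_example = [2.0, 1.5]
probs = choice_probabilities(X_example, beta_example)
ll = log_likelihood([(X_example, 0), (X_example, 1)], beta_example)
```

In an actual estimation, the coefficient vector would be chosen to maximize this log likelihood over all households and choice occasions.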

A significant potential problem with this framework is that it implies an assumption of 
independence of irrelevant alternatives (IIA). If a household is offered a new alternative
that is almost identical to one of the current alternatives, say $k$, then this new alternative
should be expected to only draw buyers from $k$; however, under IIA, the new alternative
section 4. This is a significant problem that will be addressed in future work by allowing 
for household heterogeneity. 
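
The IIA property can be illustrated numerically. In the sketch below, which uses made-up utility values, a clone of one alternative is added to a three-alternative logit. The ratio of choice probabilities between the original alternatives is unchanged, so the clone draws share from every alternative in the same proportion rather than only from the alternative it duplicates.

```python
import math

def logit_shares(utilities):
    """Logit shares implied by a list of deterministic utilities."""
    expu = [math.exp(u) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]

# Hypothetical utilities for three portals; the last one plays the role of k.
before = logit_shares([1.0, 0.5, 0.0])
# Add a new alternative with the same utility as k.
after = logit_shares([1.0, 0.5, 0.0, 0.0])

# Under IIA the odds between any two original alternatives do not change.
ratio_before = before[0] / before[1]
ratio_after = after[0] / after[1]

# Every original alternative loses the same fraction of its share.
relative_loss = [1 - a / b for a, b in zip(after, before)]
```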

In this model, the researcher observes the choice by each household on each choice
occasion. Let $y_{ijt} = 1$ if household $i$ chooses website $j$ on choice occasion $t$ and let $y_{ijt} = 0$
otherwise. The researcher also observes the characteristics of each website at that choice
occasion for that household, $X_{ijt}$.

3. Data 

A. Raw data sources and description 







The main data set consists of 3,228,595 website visits by 2654 households from December
27, 1999 to March 31, 2000. Also included in the initial data set were the time of arrival at
and departure from a website, the beginning and end of each online session, and the number
of pages visited at that site. This data, collected by Plurimus Corporation, was used to 
construct a data set of 859,587 portal choices by 2622 households. This study uses only 2008 
of these households and keeps the others to test the model out of sample. Furthermore, it 
only looks at the eight most frequently used portals comprising eighty percent of all portal 
visits. Therefore the final data set consists of 519,705 portal choices by 2005 households. 

Plurimus has an anonymizing technology that allows them to collect information about 
users without needing the users' permission. Plurimus avoids significant privacy concerns 
because the users are anonymous and the data cannot be traced to any actual person. 
They are regularly audited by PriceWaterhouseCoopers in order to ensure they exceed the 
privacy requirements of the FCC guidelines. Unlike volunteer panel data, behavioral records 
from anonymized users are not biased by the wish to be seen in a socially desirable light. 
Moreover, there is no selection bias into the sample itself, yielding a sample from a broader 
spectrum of socioeconomic status than is typically available from panel studies. 

This data, however, has five limitations that need to be considered when extending the 
results of this study to the entire Internet. First, the geographic distribution of the sample is 
considerably biased. New York, Chicago, and Los Angeles are under-represented. Roughly 
half the sample comes from the Pittsburgh area. Another quarter is from North Carolina 
and another eighth is from Tampa. This problem is not as severe as it may first appear
because portals are a national product. (Footnote: Future research with Plurimus' data will
not suffer from this limitation.)

The second limitation is that the data set does not include America Online (AOL) users.
Since AOL subscribers make up roughly 50% of all American home Internet users, this could
bias the results. AOL, however, provides a different product from the other Internet service
providers. AOL users are encouraged to stay within the gated AOL community and they
generally do not venture out onto the rest of the Internet. Moreover, preliminary surveys
commissioned by Plurimus show that when AOL users do leave the gated AOL community,
they have similar habits to other web users. This data limitation will, however, put a
downward bias on visits to the AOL portal.

Third, the data contains information on few users at work. Online habits at work are 
likely different from those at home; however according to a study by Nie and Erbring (2000), 
64.3% of Internet users use the Internet primarily at home; just 16.8% use it primarily at 
work. Few data sets, however, contain reliable at work panel data. 

Table 1 compares unique visitors as a fraction of Yahoo's users for the eight portals used 
in this study as estimated by several companies. I chose to use a base of comparison because 
the numbers vary as a result of the assumed online population. I use unique visitors rather 
than total visits because that was the data that was available from the other companies. 
The number of unique visitors for a month to a website is the number of different households 
that go to a given website over the course of the month. Some of the variation between the 
methodologies may be a result of exactly which webpages are considered part of the main 
site. The data in the table is website-specific (not Internet property-based) meaning, for 
example, that YahooSports is not considered to be a part of Yahoo. I could not find
website-based results for Media Metrix in March or for Nielsen/Netratings in any month. With the
exception of AOL, Plurimus's numbers are well within the range of the other companies, 
and therefore the above issues with the data may not be important for understanding portal 
choice by users who are not AOL subscribers. 
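
The comparison measure used in Table 1 can be sketched as follows: count the distinct households visiting each site in a month, then divide by the count for the base site. The function name and the toy visit records are assumptions made for illustration.

```python
def relative_unique_visitors(visits, base="Yahoo"):
    """Unique households per site, as a fraction of the base site's count.

    `visits` is an iterable of (household_id, site) pairs for one month.
    Repeat visits by the same household to the same site count once.
    """
    households = {}
    for household, site in visits:
        households.setdefault(site, set()).add(household)
    base_count = len(households[base])
    return {site: len(ids) / base_count for site, ids in households.items()}

# Hypothetical visit records for one month.
visits_month = [
    (1, "Yahoo"), (1, "Yahoo"), (2, "Yahoo"), (2, "MSN"),
    (3, "Yahoo"), (3, "Lycos"), (4, "Yahoo"), (1, "MSN"),
]
relative = relative_unique_visitors(visits_month)
```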

The fourth limitation is that the data is collected at the household level rather than at 
the individual level. If two people in a given household have considerably different habits 
this will show up as one person with widely varying habits. While this makes it difficult to 
assess the extent of learning over time, it is a standard problem in consumer panels. 

Fifth, it does not contain information on households from the first time they go online. 
Therefore initial conditions are potentially a problem. Although the observations may not 
be independently and identically distributed, this problem may be partially alleviated by 
the law of large numbers due to the number of observations per household in the data set.
More than 79% of the households in the final data set make 30 or more choices. The mean
household makes 259 portal choices and the median household makes 120 portal choices. 

Together, these five data limitations mean that results should be extended to different 
geographic distributions, AOL users, and at work users with caution. Furthermore, the 
fourth and fifth limitations mean that understanding learning behavior is not possible. 

I join this clickstream data set with two other data sets. The first is an advertising data 
set provided by J. Walter Thompson Company. This data set consists of all advertising 
spending by each of the portals used in this study on a monthly basis. The spending is 
determined by a thorough sampling of television, radio, newspaper, magazine, outdoor, and 
Internet advertising by each of the portals. The number of advertisements is then multi- 
plied by the average cost of advertising in each medium (at the program level in television 
and the issue level in magazines). Since this data is not individual-specific, it will likely 
underestimate the impact of advertising. The methodology used in this paper, however, 
can easily be adapted to individual-specific advertising data. 

I also constructed a data set of 'media mentions' for each of the relevant companies. If a 
company is mentioned on network television news (ABC, CBS, or NBC), in the Wall Street 
Journal, in the New York Times, or in USA Today on a given day or the day before then 
the media mentions variable is equal to one. Otherwise it is equal to zero. Unfortunately, I 
do not know which individuals were actually watching or reading which media. It is likely, 
however, that mentions in these media are highly correlated with mentions in other media 
such as local newspapers. 
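
The media mentions rule can be sketched in a few lines. The function name and the mention dates below are hypothetical; only the rule itself (a mention on the same day or the day before sets the variable to one) comes from the description above.

```python
from datetime import date, timedelta

def media_mention(mention_dates, day):
    """Return 1 if the company was mentioned on `day` or the day before, else 0."""
    previous_day = day - timedelta(days=1)
    return int(day in mention_dates or previous_day in mention_dates)

# Hypothetical mention dates for one portal (not taken from the paper's data).
mentions = {date(2000, 1, 10), date(2000, 2, 3)}
mention_flags = {
    d: media_mention(mentions, d)
    for d in (date(2000, 1, 9), date(2000, 1, 10),
              date(2000, 1, 11), date(2000, 1, 12))
}
```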

In the data set several dozen portals are observed to be chosen. For computational 
feasibility, I limit the number of portals to the eight with the most visits (in order): Yahoo, 
Microsoft Network (MSN), Netscape, Excite, AOL, Altavista, Iwon, and Lycos. These eight 
make up eighty percent of all visits and all sites with more than 2.5% of total visits. There 
was a natural break after Lycos because the ninth most visited portal, MyWay, is a site that 
is the default of several Internet Service Providers and is rarely chosen as anything but a 
start-up page. Go.com is not included because, although it is commonly ranked in the top
five portals, a large percentage of those visits are to destination websites such as ESPN.com,
Disney.com, and MrShowbiz.com. The Go.com portal page itself ranks tenth in total visits 
and ninth in unique visits. Qualitative results, however, do not change with the addition 
of more portals. Future work will explore methodologies that allow for the inclusion of a 
larger number of portals. 

B. Questionnaire 

There were several issues related to analyzing a clickstream data set that did not have obvious
answers. I conducted an email-based survey of Internet search habits to help resolve these
issues. Using surveys to inform data interpretation is relatively rare in economics. Helper
(2000) asserts that economists should use more surveys and field research in order to better
understand data. She emphasizes that this type of research "allows exploration of areas with
little preexisting data or theory" (p. 228). Analysis of clickstream data certainly qualifies
as one such area. Manski (2000) recommends questionnaires to elicit agents' preferences
and expectations directly. Jaffe, Trajtenberg, and Fogarty (2000) use surveys to determine
whether patent citations are a good proxy variable for communication. In other words, they
use a survey to determine how to interpret a data set. In this paper, I use a survey to
determine how to derive variables such as search success from raw clickstream data. Further
details on the questionnaire are in Goldfarb (2000b).

1. Questionnaire methodology 

The survey was sent to each participant as an email attachment in Microsoft Word template 
format. In the accompanying email, I explained that I was a doctoral student in economics 
studying Internet habits. Respondents came from two groups. The first group, henceforth 
referred to as the 'spammed' group, consists of the 34 respondents to unsolicited email. The
second group of respondents consisted of 23 'friends of friends'. After receiving a response
rate of roughly three percent, I decided to augment my numbers by asking several friends
and family members to forward the survey to their mailboxes. When there is sufficient 
data, I present results in this paper for the 34 'spammed' respondents and for the 57 in the 
total sample. 

Clearly, this is a biased sample and cannot be used to conduct classical statistics. It 
can, however, be used to inform myths, suggest ideas, and suggest stories. The survey 
results are quite informative about individual surfing habits. By observing a biased sample 
of people, I can follow the search process more closely than I can with a broader sample. It 
is common practice in psychology and in experimental economics to draw candidates from 
undergraduate classes, and then to use this information to inform theory. 

The survey itself asks respondents to search for driving directions, medical information,
an MP3, and something of their own choosing. Respondents then answered several questions
about the searches (see the Appendix for a copy of the full questionnaire). The search tasks
were chosen to be diverse and to reflect common search activities. The survey also asks 
several questions about user Internet habits. 

2. Questionnaire Results 

Two of the issues addressed in the questionnaire are particularly important to this paper. 
The first is determining which variables are relevant to an analysis of search engine compe- 
tition (and hence which variables to construct and include in the study). The second is how 
to determine whether a given search fails. Other issues addressed include whether faster 
search is more desirable, and whether habits differ at the second search engine in a given 
search from those at the first. 

There are many potential relevant variables for analysis. The survey asked which pages 
individuals bookmarked and what was each individual's starting page. Individuals rarely 
bookmarked portals, and those that were bookmarked were rarely used in the actual search 
part of the questionnaire. On the other hand, start pages were found to play an important
role in site choice. The survey also asked respondents to give reasons why they preferred
their favorite portal. The only specific portal feature mentioned was email. Other features 
such as shopping, Internet radio, games, and an online community were not mentioned. As 
a result of these findings, I include whether a portal is an individual's start page and whether 
that person has an email account at that site. I do not include other features or bookmarks. 

Another important variable that the survey suggests should be included is the goal of 
search. Most respondents claimed to use more than one search engine because "Different
search engines are better suited to different tasks". Links to relevant pages were also said
to be important. I could derive data on whether a portal is linked to the next page visited.
I did not include goal of search in the final analysis because including it did not improve
goodness of fit according to the Akaike information criterion or the Bayesian information criterion.

Whether a search fails is an important factor for an individual's experience with a data 
set. Ideally each person would only conduct one task during each online session. Therefore if 
the researcher observes the individual visit a search engine followed by a visit to a destination 
website without searching again, then it would be reasonable to assume the search was 
successful. In this scenario, if the researcher observes the individual search again after 
going to the site then the search would appear to have been a failure. More than 45% of 
the respondents claim to either perform several tasks or have no specific task in mind when 
they go online, considerably complicating the definition of a failed search. 

The group with no specific task in mind makes up only five percent of respondents (6% 
of spammed). Defining how they search and the reasons for it are beyond the scope of this 
survey. Much more important is controlling for the more than forty percent of respondents 
(also roughly 40% of spammed) who do several tasks when they go online. One way to do 
this is to compare the goals of searches that occur during a given session. If the goals are 
the same, it is more likely that they are part of the same search task. Also, the elapsed 
time between searches may be relevant as would the number of sites seen between the visits 
to search engines. 

Thus, if people search twice for the same thing in a short period of time, it seems
reasonable to assume that the first search was a failure and the second a success. This
relies on one further assumption: that people do not go to the destination site from the 
portal by typing in the name of the site. They only use links on the search page. They 
may type in a destination site, but not from the portal. Only 5.8% of 155 searches (4.7% 
of 85 for spammed) were followed by the use of a non-portal site that was not the final 
destination. This means that using the above method, over ninety-four percent of searches 
labelled as successful would in fact have been successful. While this is not perfect, it seems 
to be a reasonable measure. Also, if a person goes directly from one search engine to another 
then the visit to the first site is likely a failure. Using the above criteria, I constructed a 
variable for whether each search failed. 

The survey also showed that more experienced users search faster. This suggests that 
faster search is probably more desirable. Furthermore, the survey suggests that habits are 
different at the second search engine visited during a given search than at the first. Again, 
while the above information comes from a statistically biased sample, it does inform the 
researcher about analysis of clickstream data. 

C. Data set Construction 

I used the above information to construct several variables from the raw clickstream data. 
Table 2 shows a sample of ten lines of raw data. Using only this information, I constructed 
the following variables: email, goal of search, start page, view length at the portal, links, 
search failure, whether a portal was the first visited in the search process, and Guadagni & 
Little's weighted loyalty variable. I will describe the derivation of each in turn. 

A household was considered to have an email account at a site if the household used the
email feature at that site more than at any other portal. I know that a household
used email at a given site because the 'host' in the data would reveal this. For example,
'com.yahoo.mail' is Yahoo's email provider and 'com.hotmail' is MSN's email provider. No
household used more than one email account a large number of times, so I did not allow
for households to have more than one portal as an email provider. Many households
did not use a portal email provider. This 'same email' variable is potentially endogenous
when individual heterogeneity is not taken into account because users will set up an email
account at their favorite portal. As such it can be used as a proxy for some individual
heterogeneity. Furthermore, if the goal is to predict future choices or to simulate changes,
then this endogeneity is not relevant. It was the initial decision to use the email that was
endogenous; once that account is set up, each choice of portal is based on the existence
of the email account.
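
A minimal sketch of the email assignment rule follows. The two host names are the examples given in the text; the function name, the sample host sequences, and the tie-breaking behavior (first host encountered wins) are assumptions made for illustration.

```python
from collections import Counter

# Host-to-portal mapping; the two entries are the examples given in the text.
EMAIL_HOSTS = {"com.yahoo.mail": "Yahoo", "com.hotmail": "MSN"}

def email_portal(hosts_visited):
    """Portal whose email host the household used most, or None if none was used.

    Ties are broken by first occurrence, which is an assumption of this sketch.
    """
    counts = Counter(EMAIL_HOSTS[h] for h in hosts_visited if h in EMAIL_HOSTS)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Hypothetical host sequences for two households.
household_a = ["com.yahoo", "com.yahoo.mail", "com.hotmail", "com.yahoo.mail"]
household_b = ["com.excite", "com.cnn"]
```

The 'same email' dummy for a household and portal would then equal one when `email_portal` returns that portal and zero otherwise.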

As described in section 3.2, knowing the goal of search is important for knowing whether 
a search fails. The goal of search was determined by the category of the site following a visit 
to a portal, if that next site was visited within five minutes of the end of the portal visit. 
If the goal of the search is another portal, then the goal of the first search is considered to 
be the same as the goal of the second. If no site is visited within five minutes of the end of 
a portal visit, then the search is considered to have no known goal. 23.4% of all searches 
have no known goal. Most of these occur because many people return to a portal page 
before logging off the Internet. I do not consider these to be failed searches. The goals 
were divided into roughly one hundred overlapping categories including news, music, email, 
shopping for computers, automotive information and travel. 
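
The goal-of-search rule can be sketched as a single backward pass over a household's visits, so that a portal followed by another portal inherits the later portal's goal. The record layout (dicts with `arrive`, `depart`, `is_portal`, `category`) and the category labels are assumptions of this sketch.

```python
FIVE_MINUTES = 300  # seconds

def search_goals(visits):
    """Assign a goal to each portal visit in one household's ordered visit list.

    Each visit is a dict with 'arrive' and 'depart' (seconds), 'is_portal',
    and 'category' (site category; ignored for portals). Returns a list with
    the goal for portal visits, and None for non-portal visits or unknown goals.
    """
    goals = [None] * len(visits)
    # Walk backwards so a portal followed by another portal can inherit its goal.
    for idx in range(len(visits) - 2, -1, -1):
        current, nxt = visits[idx], visits[idx + 1]
        if not current["is_portal"]:
            continue
        if nxt["arrive"] - current["depart"] > FIVE_MINUTES:
            continue  # no site visited within five minutes: no known goal
        goals[idx] = goals[idx + 1] if nxt["is_portal"] else nxt["category"]
    return goals

# Hypothetical visit sequence for one household.
visits_example = [
    {"arrive": 0, "depart": 30, "is_portal": True, "category": None},
    {"arrive": 40, "depart": 70, "is_portal": True, "category": None},
    {"arrive": 80, "depart": 500, "is_portal": False, "category": "news"},
    {"arrive": 510, "depart": 520, "is_portal": True, "category": None},
    {"arrive": 1000, "depart": 1100, "is_portal": False, "category": "music"},
]
goals_example = search_goals(visits_example)
```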

A portal is considered to be a household's start page if at least 50% of all online sessions 
begin with that page. An online session is considered to end if a user does not do any activity 
for thirty minutes. While imperfect, this method determines a starting page for almost all 
of the households. Like same email, start page is potentially endogenous. People often
change their start page to their favorite website. Again like same email, this can proxy 
individual heterogeneity and the endogeneity is not relevant if the goal is to predict future 
choices or to simulate changes. 28% of households have their start page at a portal. This 
is likely lower than the general population due to the lack of AOL users. 
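
The start page rule can be sketched in two steps: split a household's visits into sessions using the thirty-minute inactivity threshold, then check whether a portal opens at least half of those sessions. Measuring inactivity from the previous departure to the next arrival is an assumption of this sketch, as are the function names and sample data.

```python
from collections import Counter

THIRTY_MINUTES = 1800  # seconds

def session_starts(visits):
    """First site of each session; a gap of 30 minutes or more starts a new session.

    `visits` is a time-ordered list of (arrive, depart, site) tuples.
    """
    starts, last_depart = [], None
    for arrive, depart, site in visits:
        if last_depart is None or arrive - last_depart >= THIRTY_MINUTES:
            starts.append(site)
        last_depart = depart
    return starts

def start_page(visits, portals):
    """Portal that opens at least 50% of the household's sessions, or None."""
    starts = session_starts(visits)
    counts = Counter(starts)
    for site, count in counts.most_common():
        if site in portals and count / len(starts) >= 0.5:
            return site
    return None

# Hypothetical visits for one household: (arrive, depart, site) in seconds.
household_visits = [
    (0, 60, "yahoo"), (100, 200, "cnn"), (5000, 5100, "yahoo"),
    (9000, 9100, "espn"), (20000, 20050, "yahoo"),
]
```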

The view length spent at a portal is the time of departure minus the time of arrival (in 
seconds). Recall that it is time spent during previous visits that is important for whether
a household returns to that portal.

The number of pages viewed at a portal may reflect the depth of search. While indi- 
viduals likely want to minimize time spent generally, search depth may be an important 
control factor. As with view length, it is number of pages viewed during previous visits that 
is important for whether a household returns to that portal. This study only reports results 
from a one period lag on last view length and last number of pages. More complicated 
functions of past time spent and previous number of pages viewed do not yield qualitatively 
different results. 
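
The one-period lags can be sketched as a running record of each household's most recent visit to every portal. At each choice occasion, the snapshot holds the last view length (departure minus arrival, in seconds) and last number of pages for each portal visited so far. The record layout and names are assumptions of this sketch.

```python
def lagged_history(portal_visits):
    """Last view length and page count per portal, as known before each choice.

    `portal_visits` is one household's time-ordered list of dicts with
    'portal', 'arrive', 'depart' (seconds) and 'pages'. Returns one dict per
    choice occasion mapping portal -> (last_view_length, last_pages).
    Portals not yet visited are absent from the snapshot.
    """
    last_seen = {}
    snapshots = []
    for visit in portal_visits:
        snapshots.append(dict(last_seen))  # state before this choice is made
        view_length = visit["depart"] - visit["arrive"]
        last_seen[visit["portal"]] = (view_length, visit["pages"])
    return snapshots

# Hypothetical portal visits for one household.
history = lagged_history([
    {"portal": "Yahoo", "arrive": 0, "depart": 40, "pages": 2},
    {"portal": "MSN", "arrive": 100, "depart": 130, "pages": 1},
    {"portal": "Yahoo", "arrive": 200, "depart": 290, "pages": 5},
])
```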

Links were determined by visiting each portal and recording which websites were directly 
linked to the main page. I recorded links in early April for each of the portals. While it
is possible that several of the links changed, there were no relevant changes in partnerships 
over that time. If the site that an individual visited following a portal visit was linked to a 
portal, the link variable takes on a value of one. Otherwise, it equals zero. Note that the 
link variable can equal one even if the household did not visit that portal. For example, 
a household could search for financial information on Yahoo, and the search may turn up
information on MSNmoneycentral. The link variable serves as a proxy for portal features. 
Instead of listing whether a portal has features, this variable proxies whether people actually 
use these features. In other words, if people use a link, it means they are using a feature at 
that site, rather than the search capabilities. 

Search failure was constructed largely as described in section 3.2. If a household visited 
two portal sites in a row, and there was less than five minutes between visits, then the first 
search is considered a failure. Furthermore, if the household conducts a search and then 
searches again for the same goal (at the same site or at a different one) within five minutes 
of the first search then the search is considered a failure. While five minutes is an arbitrary 
number, extending it to ten minutes or shortening it to three minutes did not change the 
number of failures much. As with time spent, it is whether previous searches at a site failed 
that matters. Also as with time spent, more complicated functions of past failure do not 
yield qualitatively different results. For robustness, I also calculated a failed search variable 
that included searches that were not followed by other searches.
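
The two failure rules can be sketched as below. This is an illustration under stated assumptions rather than the study's own code: the dictionary layout of a visit is hypothetical, the gap in the first rule is measured from the end of one visit to the start of the next, and the window in the second rule is measured from the start of the first search.

```python
FAILURE_WINDOW_SECONDS = 5 * 60


def flag_failed_searches(visits, portals):
    """Flag portal visits that count as failed searches.

    `visits` is a time-ordered list of dicts with keys "host", "start",
    "end" (seconds) and "goal" (a category label, or None if unknown).
    Returns a list of booleans aligned with `visits`.
    """
    failed = [False] * len(visits)
    for i, visit in enumerate(visits):
        if visit["host"] not in portals:
            continue
        # Rule 1: the very next visit is also a portal, under five minutes later.
        if i + 1 < len(visits):
            nxt = visits[i + 1]
            gap = nxt["start"] - visit["end"]
            if nxt["host"] in portals and gap < FAILURE_WINDOW_SECONDS:
                failed[i] = True
        # Rule 2: another portal search for the same goal begins within
        # five minutes of the start of this search.
        for later in visits[i + 1:]:
            if later["start"] - visit["start"] >= FAILURE_WINDOW_SECONDS:
                break
            same_goal = visit["goal"] is not None and later["goal"] == visit["goal"]
            if later["host"] in portals and same_goal:
                failed[i] = True
                break
    return failed
```

Changing `FAILURE_WINDOW_SECONDS` to three or ten minutes reproduces the kind of sensitivity check described in the text.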

If a portal was the first visited in the search process, then $\mathit{firsttry}_{ijt} = 1$. If an individual
has already searched and failed, then $\mathit{firsttry}_{ijt} = 0$.

This paper mimics Guadagni and Little's methodology for constructing their 'loyalty' 
variable almost exactly. In their paper, loyalty is considered to be a weighted average of 
past purchases of the brand, treated as dummy variables. Let $\mathit{portsame}_{ijt} = 1$ if household
i bought brand j as its previous purchase and zero otherwise.

\[ \mathit{loyalty}_{ijt} = \alpha \, \mathit{loyalty}_{ij,t-1} + (1 - \alpha) \, \mathit{portsame}_{ijt} \qquad (5) \]

Rather than estimate $\alpha$ by maximum likelihood, which would significantly complicate the
computational problem, they calibrate $\alpha$ based on dummies for lags of length one to ten. In
the present study, the value for alpha that minimizes the sum of the difference between the 
actual dummy coefficients and the loyalty function above was 0.7782. I also use portsame 
alone as a loyalty variable in the study. Note that this loyalty variable can be a result of 
either individual preferences for a given portal or from some kind of lock-in. In future work, 
I plan to separate out these effects of heterogeneity and state dependence. In a recent study, 
Abramson, Andrews, Currim, and Jones (2000) find this to be the best loyalty measure they 
tried. 
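
The recursion in equation (5) is straightforward to compute for a single household and portal. The sketch below is illustrative only: the function name is hypothetical, and the starting value of zero is an assumption, since the text does not state how the loyalty series is initialised.

```python
ALPHA = 0.7782  # calibrated smoothing weight reported in the text


def loyalty_path(portsame, alpha=ALPHA, initial=0.0):
    """Apply equation (5) to a household's sequence of portsame dummies
    for one portal and return the loyalty value at each choice occasion.

    `initial` is the assumed loyalty value before the first observation.
    """
    values = []
    previous = initial
    for dummy in portsame:
        previous = alpha * previous + (1.0 - alpha) * dummy
        values.append(previous)
    return values
```

With this form, a run of repeat visits raises loyalty gradually, and a single visit elsewhere lowers it by the factor alpha rather than resetting it to zero.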

In this study, I define the $\mathit{portsame}_{ijt}$ variable to depend on the previous portal visited
of any kind, not just the previous of the eight portals used in this study. Therefore, if a
household visits Yahoo then About.com and then Yahoo again, $\mathit{portsame}_{ijt}$ on the second
visit to Yahoo is equal to zero, even though only two observations are included in the data
set. This means that a household is not considered brand loyal if it went to a rival portal's
website, even if that rival portal is not in the sample. If I include only the sample portals, the
coefficient on the loyalty variable increases slightly but its significance falls slightly. Note
that the initial conditions problem frequently encountered in this literature does not apply 
here due to the large number of observations per household. 

How much time a household's previous visit to a portal took and whether that search 
failed are only observed when the household has visited that portal previously in the data
set. Since not every household visits every portal, these variables are missing for a large 
number of observations. I therefore created a dummy variable for missing data. I also 
interact one minus the missing data variable with the view length of previous search and 
the failure of previous search variables. This overcomes the significant potential bias of 
assuming a value for the missing data or of ignoring it entirely. The missing data dummy 
has no economic interpretation. 
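
The missing-data treatment amounts to a dummy plus interaction terms, which can be sketched as follows. The function and key names are hypothetical; the point is only that the lagged variables enter as zero whenever the dummy equals one.

```python
def lagged_features(last_view_length, last_search_failed):
    """Build the missing-data dummy and the interacted lag variables.

    Pass None for both arguments when the household has not yet visited
    the portal in the data set.
    """
    missing = 1 if last_view_length is None or last_search_failed is None else 0
    observed = 1 - missing
    return {
        "missing_data": missing,
        # (1 - missing) * lagged value: zero whenever the lag is unobserved.
        "last_view_length": observed * (last_view_length or 0.0),
        "last_search_failed": observed * (last_search_failed or 0),
    }
```

Because the lagged terms are zeroed out when the dummy is one, the dummy's coefficient absorbs the average difference for those observations, which is why it carries no economic interpretation.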

Tables 3 and 4 contain descriptive statistics of the final data set. 

4. Results 

A. Coefficients 

Table 5 presents the main results of the paper. Model (1) presents the basic model. Here, 
the potentially endogenous variables of same email, link, and start page are not included. 
The variables all have the expected signs, although last view length is barely significant: 
loyalty, advertising, and media mentions are all correlated with a higher probability of 
search. Last view length and last search failed are both correlated with a lower probability
of search. The positive sign on last view length squared suggests that the negative effect of
last view length diminishes as view length grows (the relationship is convex). There was no
expectation on the sign of missing data. The coefficient on advertising likely underestimates
the actual effect of advertising, as the data measure advertising aggregated over the month
rather than the advertising actually viewed by the user.

Model (2) adds same email and link with the expected results. Taking these into 
account makes last view length significant. Model (3) adds last number of pages and 
first try. Last number of pages is found to have an increasing and concave relationship 
with choice probability. This is consistent with the assumption that pages viewed proxy 
depth of search. In this regression, last view length is significant at the 99% confidence level. 
Thus, controlling for depth, households prefer to spend less time at a portal. First try
reveals that Netscape and MSN are preferred more as first pages in a search than as later pages.
This makes sense as they are the pages that appear when using the search function in the 
Netscape Navigator and Microsoft Internet Explorer browsers. They are also often default 
start pages, but the results do not change in models (4) through (6) which control for the 
start page. 

Model (4) adds the start page variable to model (2). The coefficient on this variable 
is very large compared to the other dummy variables and the likelihood improves more for 
this variable than for any others; however, the coefficient is not significantly different from 
zero as it has an extremely high standard error. 

Model (5) is the same as model (4) except that it adds the interaction variable of media
mentions and loyalty. Of particular interest here is the increase in the significance of 
media mentions. This suggests that being mentioned in the media has a larger effect for 
households that are less loyal to the brand. 

Model (6) is the 'kitchen sink' regression in that it includes all of the variables in the 
study. The coefficients and their significance are similar to models (1) through (5). 

Another interesting aspect of all of the models is that there is a clear brand preference 
for Yahoo over the others. Models (1) through (3) have negative coefficients for all brand 
dummies (Yahoo is the base). Models (4) through (6) also have negative brand dummies relative to Yahoo,
but other portals are often preferred on the first try. Adding the coefficients together, however,
leaves a negative number, meaning that Yahoo is preferred even on the first try.

The Akaike information criterion revealed that last view length squared, last number of 
pages squared, and media mentions * loyalty should be included. Other variables such 
as advertising squared and advertising * loyalty did not satisfy the Akaike information 
criterion. Note that including start page increases the likelihood a great deal, even though 
the effect is statistically insignificant. Any variables included in this study that satisfy the 
Akaike information criterion also satisfy the Bayesian information criterion. 
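
As a rough illustration of the criterion, the sketch below compares models (1) and (2) using the log likelihoods reported in Table 5. The parameter counts (14 and 16) are tallied from the rows of that table and are an assumption of this example; the Bayesian criterion is omitted because it also requires the number of observations.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# Log likelihoods as reported in Table 5; parameter counts are assumed.
model_1 = aic(-442856, 14)
model_2 = aic(-425651, 16)

# Model (2) is preferred if its criterion value is lower.
print(model_2 < model_1)
```

The gain in log likelihood from adding same email and link is far larger than the penalty of two extra parameters, so the criterion favours model (2).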

B. Robustness of coefficients to small changes

Three different models are estimated in table 6. The first is the same as model (2) except 
that it uses a broader definition of failed search. If no search is conducted after visiting 
a portal, then that is included in the failed search variable. Under the new definition of 
failed search, as under the old definition, the coefficient is significantly negative at the 99% 
confidence level; however, the magnitude of the coefficient itself is smaller. Furthermore, in 
this regression, last view length is not significantly different from zero. 

The second and third models in table 6 mimic model (2) but change the loyalty variables. 
The second (model (8)) uses dummy variables for whether the portal is the same as that 
used the previous period and that used two periods before by that household. The third 
(model (9)) uses only the one period lag. The coefficients are still significantly positive in 
all cases. Note, however, that the explanatory power of these two methods is considerably 
less than that of Guadagni and Little's loyalty variable. In both of these models, the 
last view length variable is not significantly different from zero. In model (9), the coefficient 
becomes positive. If last number of pages is included then last view length does become 
significantly negative. 

Table 7 shows the results of conducting the above analysis with any seven of the eight 
portals. Note that, with a few exceptions, the coefficients change little. When either of 
the two largest advertisers (AOL or Yahoo) is dropped, advertising becomes insignificantly
negative. When Iwon, the site with the highest view length, is dropped, past view length 
becomes insignificantly positive. This is not because people are playing games at Iwon, 
since games are considered a destination and not part of the portal page. 

Although there is little change in the coefficients, the statistics at the bottom of table
7 show that the independence of irrelevant alternatives (IIA) assumption does not hold in
this model. IIA implies that there is no correlation between the alternatives outside of 
the effects of known features. It is likely that Netscape and MSN are highly negatively 
correlated since they are based on different browsers. This method wrongly assumes that 
they are uncorrelated, bringing potential bias to some of the coefficients and weakening the
assertions that can be made from policy analysis. It is essential that future work control
for IIA. 

These statistics were calculated using a Hausman test following Hausman and McFad- 
den (1984). The coefficients on the brand dummies were neither included in the Hausman 
test nor presented in table 7 although they were estimated for each model. While the 
coefficients themselves change little when a portal is dropped out of the estimation, the 
large sample size and corresponding low variances of the coefficients lead to a rejection of 
IIA. This is a considerable, though frequently encountered, problem in this type of analysis. 
Hausman and Wise (1978) and, more recently, Berry (1994) describe how accounting for
heterogeneity alleviates this problem. This will be the subject of future work. Guadagni 
and Little (1983 p. 221), however, argue that "a more important test of the model will be 
its performance on a holdout sample of customers." This is conducted in the next section. 
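
For reference, the standard form of the Hausman and McFadden statistic is given below. The notation is introduced here for illustration and is not taken from the paper: the subscript f denotes estimates from the full choice set, r denotes estimates from the restricted set with one portal dropped, V denotes the estimated covariance matrices, and k is the number of coefficients compared.

```latex
H = (\hat{\beta}_r - \hat{\beta}_f)'
    \left[ \hat{V}_r - \hat{V}_f \right]^{-1}
    (\hat{\beta}_r - \hat{\beta}_f)
    \;\sim\; \chi^2_k \quad \text{under the null hypothesis that IIA holds}
```

With very precisely estimated coefficients, the bracketed covariance difference is small, so even minor shifts in the coefficients produce a large statistic. This is consistent with the rejection described above despite coefficients that change little.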

These robustness checks suggest that the effects of advertising and last view length 
on probability of choice may not be significantly different from zero. The coefficient on 
start page is also not significantly different from zero. The effect of media mentions is 
robust, but the impact is still not large. The other variables are all very important, partic- 
ularly loyalty. The cause of the importance of the loyalty variable, however, is unknown; it 
could be due to either state dependence or unobserved household heterogeneity or both. 

C. Predictive Ability 

This section explores out of sample predictive power. Figure 1 shows the predicted and 
actual shares of MSN over the fourteen weeks from December 27 1999 to March 31 2000 for 
an outside sample of roughly 600 households. The predictions are done using both model 
(1) and model (6). In this case, both models match the actual shares rather closely. Figures 
2 through 8 show the predicted and actual shares for the other portals. The fits are far 
from perfect. Both models under-predict Yahoo's share, both over-predict AOL, Altavista, 
Excite, Iwon, and Lycos, and both fit MSN and Netscape fairly well. With the exception of
Iwon, model (6) fits better than model (1). For each of the brands, however, both models 
matched the general trends in the actual shares. Week-to-week changes in actual shares are 
captured by the predicted models. 

Accounting for differences among households should help improve this predictive ability. 
Preliminary work in accounting for household tastes has shown, for example, that some 
people have a substantial taste preference for Iwon, while others have a substantial dislike. 
This bimodal distribution of tastes is averaged out in the model used here. Thus, actual 
preferences for Iwon are not well represented. The preliminary work suggests that the 
brands with a unimodal and narrow distribution of tastes across households are predicted 
better than are other firms in the model presented in this paper. 

While not perfect, this model has significant predictive power and could be used to 
explore how policies in one market would work in another. 

D. Market Response to Variable Changes 

Tables 8 and 9 explore the market responses to variable changes in model (2) assuming 
no competitive response. Table 8 presents the elasticity of the model to slight changes 
in the variables at the variable means. Table 9 converts these elasticities to changes in 
number of site visits. This table assumes that there are a total of 76.5 million web users, 
Nielsen/Netratings' estimate for the month of February, 2000. While the elasticity numbers 
appear small, the increase in the number of site visits from a marginal increase in a variable 
can be quite large. Taking the results at face value, if MSN users' searches failed just 1% 
less often, MSN would get almost three million more site visits. If each site visit is worth
five cents (about the revenue received from the five advertisements seen over typical two 
page views at a typical search engine), then it would be worth it for MSN to implement this 
change as long as it cost less than one hundred and fifty thousand dollars. 
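
The break-even figure follows from simple arithmetic, sketched here with the round numbers quoted in the text (three million visits is the text's "almost three million", so the result is an upper approximation).

```python
extra_visits = 3_000_000       # "almost three million" additional site visits
revenue_per_visit = 0.05       # five cents per visit, as assumed in the text

# The change pays for itself if it costs less than the added revenue.
break_even_cost = extra_visits * revenue_per_visit
print(f"${break_even_cost:,.0f}")
```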

The advertising results are perhaps the most interesting. An increase in advertising by 
one dollar would bring six more visits to Altavista but twenty-six more to Yahoo. Therefore,
Altavista should increase its advertising if each new site visit brings in seventeen cents of 
revenue and Yahoo should increase its advertising if each new site visit brings in just four 
cents of revenue. 
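
The revenue thresholds are the reciprocal of the visits gained per advertising dollar. A minimal sketch, with a hypothetical function name:

```python
def break_even_revenue_per_visit(visits_per_dollar):
    """Revenue per visit (in dollars) at which one more advertising
    dollar exactly pays for itself."""
    return 1.0 / visits_per_dollar


altavista = break_even_revenue_per_visit(6)    # six visits per dollar
yahoo = break_even_revenue_per_visit(26)       # twenty-six visits per dollar
print(round(100 * altavista), round(100 * yahoo))  # thresholds in cents
```

Rounded to the nearest cent, these give the seventeen-cent and four-cent thresholds quoted above.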

Caution should be used in interpreting these results because IIA is rejected and
because the functional form of the error term is important to deriving these results. The
results, however, do show what future studies that relax IIA and impose fewer functional-form assumptions
can achieve and they are informative about general trends. For example, while the numbers 
themselves may not be completely accurate, it is likely that an extra dollar of advertising 
by Yahoo has a larger effect than an extra dollar of advertising by Altavista. The current 
exercise should be viewed as an approximation that demonstrates potential marginal gains 
from the variables. 

Another way to simulate policy changes by the firms is to change the underlying data 
and reestimate the market shares given the known coefficients. This method underestimates 
changes because it does not count dynamic effects. It does, however, provide a lower bound 
for the impact. Again using model (2), I undertook this exercise for several variables. If 
MSN advertised as much as AOL, then MSN would gain 13,857,734 more visits assuming 
76.5 million users. If, on the other hand, Iwon advertised as much as AOL then it would 
only gain 2,857,924 visits. If Lycos searches were successful as often as Yahoo searches, 
Lycos traffic would rise by 25,726,505 or four percent. If Altavista had the same links as 
MSN then it would get 98,948,093 more visitors or ten percent. Again, the exact quantities 
of these predictions should be interpreted with caution. The general trends, however, are 
informative. 
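
The simulation approach can be illustrated with a small multinomial logit calculation. This is a sketch under simplifying assumptions, not the study's procedure: it works with a single vector of utilities rather than recomputing choice probabilities for every observation and aggregating, and the function names and example utilities are hypothetical.

```python
import math


def logit_shares(utilities):
    """Multinomial logit choice probabilities from a dict of utilities."""
    top = max(utilities.values())  # subtract the maximum for numerical stability
    weights = {j: math.exp(v - top) for j, v in utilities.items()}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


def simulate_change(utilities, portal, coefficient, delta_x):
    """Shares after raising one portal's variable by delta_x, holding the
    estimated coefficient and every other utility fixed."""
    changed = dict(utilities)
    changed[portal] += coefficient * delta_x
    return logit_shares(changed)
```

For example, starting from utilities `{"Yahoo": 0.0, "MSN": -0.174, "Lycos": -0.808}` and calling `simulate_change(..., "Lycos", 1.982, 1)` raises the Lycos share at the expense of the other two portals in proportion to their baseline shares, which is the IIA property discussed above.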

5. Conclusion 

This study has provided a preliminary look at estimating demand for advertising-supported 
Internet websites based on clickstream data. The methodology provides a reasonable fit 
to the actual patterns in the data. It has reasonable predictive power and is informative
about the potential impact of various policy changes. 

This methodology has several weaknesses. The first, and most important, is that it does 
not take into account individual heterogeneity. This leads to a rejection of the Independence 
of Irrelevant Alternatives hypothesis as well as poor predictive ability for Iwon and Yahoo 
in particular. In future work with the data, I will estimate a model that accounts for this 
heterogeneity. 

Another weakness in this methodology is that it does not allow for the market to grow. It 
predicts changes in share of a given population. It therefore ignores the impact of new users 
in a rapidly growing market and the effect of promotion on market size. The assumption 
that new users will have similar tastes to the current ones has some supporting evidence in 
the fact that market leaders change little over time in advertising-based Internet industries
(Goldfarb 2000b), but this methodology is much better at exploring the demand of existing 
users rather than that of potential new users. 

With respect to policy implications, the study provides a framework for understanding
policy effects. The simulations in section 4.3 show the impact of potential policy changes
on market shares. While they do not take into account supply side reactions or individual 
heterogeneity, they do give better estimates of policy effects than currently exist. More 
detailed policy analysis can also be explored in this framework. For example, a portal 
could simulate a link to a commonly used site, say americangreetings.com. It could then 
determine the effect of this link on market share. The actual increase in share resulting 
from this change would be no more than the simulated level. It may be less because it 
may be that people who go to a given portal are also the kind of people who like the links 
it has. Thus the effects of the new link may be less than predicted. Because it does not 
account for individual heterogeneity, this model does not provide an effective framework for 
examining the effects of major industry changes such as bankruptcies, nor does it provide a 
way to look at the welfare impact of improved technology. In future work, I will match the 
heterogeneous demand model to a supply side model and estimate the effects of industry 
changes on demand and welfare.

The main purpose of this study was to show that demand for free online services can be
estimated using methodologies that are common in both the economics and the marketing 
literature. The coefficients on the variables in the study had the expected signs and the 
predictive ability of the model, though not perfect, captured the major trends. Furthermore, 
informative simulations can be conducted about the effects on share of changing variable 
values. Clickstream data will be an important tool in understanding online demand. This 
study has shown that the standard econometric methods that have previously been applied 
to grocery scanner data can successfully be applied to clickstream data. By bringing more 
econometric sophistication to this analysis, economists and marketers can gain a better 
understanding of online user behavior. 

Figures 1 through 8: actual and predicted weekly shares for each portal over weeks 1 to 14
(horizontal axis: week). Series plotted: actual, predicted all, predicted some.

Figure 1: MSN
Figure 2: Altavista
Figure 3: AOL
Figure 4: Excite
Figure 5: Iwon
Figure 6: Lycos
Figure 7: Netscape
Figure 8: Yahoo!


TABLE 1: Unique visitors at site / unique visitors at Yahoo!

            January 2000                  February 2000                 March 2000
Portal      Foveon    Media     PC Data   Foveon    Media     PC Data   Foveon    PC Data
                      Metrix    Online              Metrix    Online              Online
AltaVista   0.436     0.293     0.413     0.479     0.268     0.386     0.434     0.367
AOL         0.509     0.724     0.996     0.538     0.708     0.950     0.489     0.960
Excite      0.393     0.339     0.399     0.440     0.348     0.435     0.403     0.413
Iwon        0.110     0.146     0.279     0.194     0.145     0.305     0.125     0.290
Lycos       0.418     0.414     0.577     0.439     0.607     0.506     0.403     0.508
MSN         0.866     0.788     0.726     0.906     0.810     0.655     0.833     0.693
Netscape    0.564     0.527     0.479     0.624     0.502     0.446     0.521     0.442
Yahoo       1         1         1         1         1         1         1         1



Foveon data is from the dataset used in this study. Media Metrix data and PC Data Online data are from data posted at cyberatlas.internet.com. All data is 
website-based, not property-based. There was no Media Metrix website-based data for March. There was no website-based data available for 
Nielsen/Netratings. 



TABLE 2: Clickstream Data Sample

USER  HOST            START TIME         END TIME           BYTES FROM  BYTES TO  # PAGES VIEWED AT HOST
1     com yahoo       14MAR00 08:42:55   14MAR00 08:45:28   196593      34484     3
1     com allrecipes  14MAR00 08:45:28   14MAR00 08:50:59   65825       656       12
1     com ivillage    14MAR00 08:55:00   14MAR00 09:09:48   541337      72005     53
1     com allrecipes  18MAR00 12:27:10   18MAR00 12:34:46   75403       4454      5
1     com allrecipes  21MAR00 12:31:01   21MAR00 12:36:51   75873       658       2
1     com excite      28MAR00 13:13:59   28MAR00 13:15:22   105884      4006      4
1     com adobe       28MAR00 13:15:06   28MAR00 13:19:39   70732       11988     9
1     gov nara        28MAR00 13:19:38   28MAR00 13:21:57   1259        2340      1
1     gov nara        28MAR00 13:34:09   28MAR00 13:38:00   60155       9074      13
1     com allrecipes  30MAR00 16:44:18   30MAR00 16:52:05   86186       1857      4



TABLE 3: Summary statistics

Columns:
(1) % share of all portal visits
(2) % share of visits to top 8 portals
(3) Average time spent at site (in seconds)
(4) % time fails
(5) % time fails or last visited
(6) % households with portal as start page
(7) % households with same email
(8) % link
(9) % days with media mentions

Portal      (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)
Yahoo       33.4    42.0    96.7    7.03    29.36   9.76    19.92   3.20    58.33
MSN         16.6    20.9    116.7   12.10   34.77   7.17    32.97   4.41    6.35
Netscape    10.7    13.5    114.0   13.33   43.77   5.38    4.38    3.62    13.54
Excite      5.2     6.5     93.2    11.28   44.37   1.29    2.39    2.57    15.63
AOL         4.4     5.5     93.9    11.11   38.79   0.75    4.48    2.78    82.29
AltaVista   4.0     5.0     109.7   14.41   35.21   0.30    0.40    0.17    5.21
Iwon        2.8     3.6     152.0   14.81   34.39   0.30    1.59    0.69    1.04
Lycos       2.5     3.0     96.2    31.55   49.89   0.20    4.63    1.82    16.67



TABLE 4: Summary Statistics

Variable              Mean        Std Dev     Min     Max
Advertising ($ 000)   1772.453    2389.565            14962.66
Media Mentions        0.339       0.473               1
Start Page            0.0241      0.153               1
Same Email            0.113       0.317               1
Link                  0.0187      0.135               1
Last view time        63.441      171.592             31933
Last number pages     2.203       4.536               473
Last Search Failed    0.153       0.360               1
Missing Data          0.383       0.486               1
GL Loyalty            0.386       0.678               3.929
Portsame              0.099       0.299               1
First Try             0.639       0.480               1



TABLE 5 - model coefficients (with standard errors in parentheses)

Variable                  Model (1)     Model (2)     Model (3)     Model (4)     Model (5)     Model (6)
GL Loyalty                1.352***      1.314***      1.320***      1.205***      1.267***      1.267***
                          (0.00235)     (0.00245)     (0.00247)     (0.00261)     (0.00368)     (0.00368)
Missing Data              -2.352***     -2.317***     -2.282***     -2.257***     -2.237***     -2.209***
                          (0.0126)      (0.0127)      (0.0129)      (0.0129)      (0.0130)      (0.013165)
Last view time at that    -1.90E-05^    -2.20E-05*    -0.000120***  -2.60E-05*    -2.60E-05*    -0.000110***
site                      (1.34E-05)    (1.35E-05)    (1.59E-05)    (1.41E-05)    (1.41E-05)    (1.67E-05)
Last view time squared    2.08E-09**    2.31E-09**    6.69E-09***   2.43E-09**    2.49E-09**    5.95E-09***
                          (9.89E-10)    (9.87E-10)    (9.90E-10)    (9.87E-10)    (9.93E-10)    (9.97E-10)
Last search failed        -0.476***     -0.440***     -0.425***     -0.451***     -0.452***     -0.451***
                          (0.00608)     (0.00618)     (0.00620)     (0.00645)     (0.00646)     (0.00646)
Advertising               5.89E-06*     6.08E-06**    6.17E-06**    4.59E-06^     5.30E-06*     5.53E-06*
($ 000)                   (3.01E-06)    (3.07E-06)    (3.09E-06)    (3.17E-06)    (3.16E-06)    (3.16E-06)
Media Mentions            0.0137**      0.0136**      0.0124*       0.0109^       0.129***      0.128***
                          (0.00667)     (0.00680)     (0.00683)     (0.00712)     (0.00857)     (0.00857)
Media Mentions*loyalty                                                            -0.144***     -0.143***
                                                                                  (0.00590)     (0.00590)
Same email                              0.166***      0.174***      0.174***      0.181***      0.181***
                                        (0.00511)     (0.00513)     (0.00544)     (0.00544)     (0.00544)
Link                                    1.982***      2.015***      2.053***      2.056***      2.054***
                                        (0.0109)      (0.0110)      (0.0113)      (0.0113)      (0.0113)
Last number pages                                     0.0103***                                 0.00875***
viewed at that site                                   (0.000710)                                (0.000726)
Last number of pages                                  -6.70E-05***                              -5.10E-05***
squared                                               (9.19E-06)                                (8.38E-06)
Start page                                                          34.123        41.112        36.110
                                                                    (74642.1)     (2474687)     (203151.9)
AltaVista                 -0.530***     -0.494***     -0.287***     -0.248***     -0.246***     -0.258***
                          (0.0103)      (0.0105)      (0.0141)      (0.0142)      (0.0142)      (0.0142)
AOL                       -0.571***     -0.700***     -0.764***     -0.726***     -0.769***     -0.779***
                          (0.0169)      (0.0173)      (0.0202)      (0.0205)      (0.0205)      (0.0205)
Excite                    -0.479***     -0.612***     -0.548***     -0.540***     -0.543***     -0.553***
                          (0.00971)     (0.0101)      (0.0145)      (0.0147)      (0.0147)      (0.0147)
Iwon                      -0.415***     -0.430***     -0.662***     -0.633***     -0.639***     -0.662***
                          (0.0135)      (0.0138)      (0.0204)      (0.0205)      (0.0206)      (0.0207)
Lycos                     -0.686***     -0.808***     -0.489***     -0.494***     -0.499***     -0.496***
                          (0.0105)      (0.0108)      (0.0147)      (0.0149)      (0.0148)      (0.0148)
MSN                       -0.0270***    -0.174***     -0.592***     -0.654***     -0.674***     -0.670***
                          (0.00953)     (0.00971)     (0.0128)      (0.0133)      (0.0133)      (0.0133)
Netscape                  -0.157***     -0.261***     -0.695***     -0.779***     -0.791***     -0.798***
                          (0.0101)      (0.0104)      (0.0144)      (0.0150)      (0.0150)      (0.0151)
First Try (Altavista)                                 -0.393***     -0.345***     -0.353***     -0.353***
                                                      (0.0169)      (0.0170)      (0.0171)      (0.0171)
First Try (AOL)                                       0.0924***     0.135***      0.137***      0.139***
                                                      (0.0167)      (0.0169)      (0.0168)      (0.0168)
First Try (Excite)                                    -0.126***     -0.153***     -0.165***     -0.168***
                                                      (0.0171)      (0.0176)      (0.0176)      (0.0177)
First Try (Iwon)                                      0.321***      0.361***      0.357***      0.361***
                                                      (0.0219)      (0.0221)      (0.0223)      (0.0223)
First Try (Lycos)                                     -0.580***     -0.468***     -0.474***     -0.475***
                                                      (0.0195)      (0.0197)      (0.0196)      (0.0196)
First Try (MSN)                                       0.631***      0.632***      0.632***      0.633***
                                                      (0.0123)      (0.0129)      (0.0129)      (0.0129)
First Try (Netscape)                                  0.646***      0.668***      0.665***      0.667***
                                                      (0.0144)      (0.0154)      (0.0154)      (0.0154)
Log likelihood            -442,856      -425,651      -421,531      -386,956      -386,659      -386,581

*** significant at a 1% level in a two-tailed test
** significant at a 5% level in a two-tailed test
* significant at a 10% level in a two-tailed test
^ significant at a 10% level in a one-tailed test



TABLE 6 - Robustness to alternative variable definitions

Cells marked [illegible] could not be read in the source. The rows for the loyalty
variables and for advertising are not legible and are omitted.

Variable                        Model (7)      Model (8)      Model (9)
Missing Data                    [illegible]    [illegible]    [illegible]
                                (0.0128)       (0.0125)       (0.0124)
Last view time at that site     -1.30E-05      -1.20E-05      7.69E-06
                                [illegible]    [illegible]    [illegible]
Last view time squared          1.97E-09**     2.71E-09***    1.95E-09**
                                [illegible]    [illegible]    [illegible]
Last search failed                             -0.292***      -0.276***
                                               [illegible]    [illegible]
Last search failed or last      -0.240***
visit of session                [illegible]
Media Mentions                  0.0139**       0.0154**       0.0163***
                                (0.00679)      (0.00654)      (0.00619)
Same email                      0.173***       0.417***       0.575***
                                (0.00510)      (0.00482)      (0.00450)
Link                            1.988***       1.927***       1.964***
                                [illegible]    (0.0108)       (0.0103)
AltaVista                       [illegible]    [illegible]    [illegible]
                                (0.0105)       (0.0102)       (0.00971)
AOL                             [illegible]    [illegible]    [illegible]
                                (0.0173)       (0.0169)       (0.0162)
Excite                          [illegible]    [illegible]    [illegible]
                                (0.0100)       (0.00964)      (0.00909)
Iwon                            [illegible]    [illegible]    [illegible]
                                (0.0138)       (0.0135)       (0.0128)
Lycos                           -0.841***      -1.109***      -1.310***
                                (0.0108)       (0.0106)       (0.0103)
MSN                             -0.186***      -0.360***      -0.486***
                                (0.00970)      (0.00946)      (0.00900)
Netscape                        -0.260***      -0.347***      -0.420***
                                (0.0104)       (0.0100)       (0.00955)
Log likelihood                  -427,055       -456,840       -504,324

*** significant at a 1% level in a two-tailed test
** significant at a 5% level in a two-tailed test
* significant at a 10% level in a two-tailed test


TABLE 7 Independence of Irrelevant Alternatives 



Variable 


■Cull IV^j-vj-lal 

run Mouei 


No Altavista 


INO AUL 


No Excite 


No Iwon 


No Lycos 


INO JVloIN 


No Netscape 


No Yahoo 


LjL Loyalty 


1 'J 1 A 






1 om^AA 
1 .ZyZ 


i.jUd 


l.Zoo 


1 /I 1 '3 * A * 


1 '3 1 AA 


l.JJO 




(,U.UU/4j ) 


(U.UUZoi ) 


/A AAOA'2^ 


(U.UUZoZJ 


/A AAO^^^ 


/A AAO ^ 1 ^ 

(U.UUZj 1 ) 


/A AA'2'2 1 ^ 


/A AAOC'2^ 


/A AA/1AA\ 


Missing Data 


-z.il 


o Q 1 ^ 


-Z.J 


-z.jZo^^^ 


-Z.Uoo 


-Z.joZ'^*'^ 


-Z.JjZ^^^ 


-Z.jUj^^^ 


-Z.jZj'^'^'^ 






yj.Kjlj 1 ) 


(^U.Ul JO ^ 


/'M Ml '5 A^ 


(U.uizy ) 


cr\ ^\^ '2A^ 

(U.Ui JO^ 






(U.Ui4o ) 


Last view time 


-Z.ZL-Uj 


-O.l Jii-U/^^ 


-4.jt-Uj 


c 1 rj A^i!:;!:;!: 


o.oot-Uo 


-Z.40L-U0 


A AAA! '2 


n AAC A<* 


A OC A*::!::!;:!; 


at that site 






(^1.4411-1) J J 


(^1. / lii-UD J 


( 1 .44rl,-U J ) 


(U.UUUU14J 




/'I /^91h 


(l.ojJi-UjJ 


Last view time 


O 'J 1 r? r\Q*5i: 

Z. J iL-uy 




o 1 cr: AO*** 

J. 1 jL-uy 




Q A^r? 1 A 
O.OOL-lU 


i.jjL-uy 


Z.Z4n-Uo 


O A7T7 AQ* 

Z.U /li-uy 


z.o iL-uy^^ 


squared 


/'Q CTP 1 (X\ 
(y.o /t^-LV) 




^ i .uuL-uy ) 


HAP MQ^ 


/■I M/1T7 MQ^ 
(i.U4ii-Uy^ 


( 1 .uzL-uy J 






( 1 . 1 / L-uy ^ 


Last search 


-U.44U 


-U.4U / 


r\ A A Q*** 


A /I OO**:!; 

-U.4ZZ 


A /I '2 0*** 

-U.4jo 


A Af^f\^^^ 

-U.4oU 




A /t 

-U.4j4^^^ 


A * ** 


failed 


^U.UUOloJ 




^U.UUOJZJ 




(U.UU04UJ 


(U.UUOJ J ) 




^^u.uuoyj ) 


(U.UU/ol ) 


Advertising 


O.Uot-Uo 


/Coin? 


-O.ljL-UO 




4.4011-UO 


1 .uyL-uj 


A A^Z? A/C 




Q c 1 17 A< 


(■ji UUUJ 


(j.U/L-UOj 




//lilt? r\/;\ 


{j.ZZlL-\JO) 


(J.UOL-UOj 


(J.i jL-UOj 


/"2 'J A r: AA\ 


1 QT7 A/C\ 

(J. ioii-UOj 


/"2 7Qr? r\/^\ 

(J. /oL-Uo; 


Media 


U.Ui JO 


U.Ul 03^ ^ 


A AAQCO 


A AAQ9'2 


A A1 /;Arf: 


A A1 AQ 


A A 1 'J /I * 
U.Ul J4^ 


A Al /C7** 


A A1 Q1 * 

U.Uioi 


Mentions 


(^U.UUuoUj 


(U.UU i Vy) 


(U.UU/U4^ 


(^U.UU / J / j 


(U.UUO /u^ 


(U.UU/Z/ J 


(U.UUo io ) 






Same email 


U. iOO 


U. i 1 z^^^ 


A 1 T/1 

U. i /4-^-^-^ 




A 1 QA 


A 1 nr\-M-M-M 


A O 1 '2*** 


A 1 ^Q*** 


A AQC/1 •Jf.-M-M 




1 ij 




("A fin^'?A~i 


/A AA'^^^^ 




t^U.UUJZO^ 


/A AA7^Q^ 


yyj.yjyjjKij ) 




Link




L.KjLL 


1 Q1Q*** 

1 .o ly 


1 oni *** 


1 Q77*** 


o n7n*** 
z.u /u 


9 1 "2^*** 
Z. 1 jO 


9 lA'?*** 


1 QA7*** 








(0 01251 






m 01 igi 




fO 0126^ 
























Log Likelihood            -425,651     -366,971     -360,085     -368,981     -390,203     -380,168     -274,924     -329,096     -230,649
N                         519,705      493,755      490,957      485,707      501,162      504,239      411,099      449,810      301,206
Chi squared test of IIA   N/A          756.1        339.6        695.9        4682.9       2338.9       3016.8       216.2        7147.7





















*** significant at a 1% level in a two-tailed test 
** significant at a 5% level in a two-tailed test 
* significant at a 10% level in a two-tailed test 
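
The chi-squared statistics in Table 7 compare the full model with models re-estimated after one portal is dropped from the choice set. The formula behind them is not shown in this part of the paper. A common construction for a test of this kind is the Hausman-McFadden statistic, q = (b_r - b_f)' [V_r - V_f]^(-1) (b_r - b_f), where b_f and V_f are the full-model coefficients and covariance matrix, and b_r and V_r are their restricted-model counterparts. The sketch below assumes that form. The function names and the example numbers are hypothetical and are not taken from the paper.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's inputs are not modified.
    m = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance difference is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def hausman_mcfadden(beta_full, cov_full, beta_restricted, cov_restricted):
    """Return (statistic, degrees of freedom) for the assumed test form.

    All inputs must already be limited to the coefficients that are
    identified in both the full and the restricted model.
    """
    k = len(beta_full)
    diff = [beta_restricted[i] - beta_full[i] for i in range(k)]
    cov_diff = [[cov_restricted[i][j] - cov_full[i][j] for j in range(k)]
                for i in range(k)]
    x = solve_linear(cov_diff, diff)
    stat = sum(d * xi for d, xi in zip(diff, x))
    return stat, k


# Hypothetical two-coefficient example (not values from Table 7).
stat, df = hausman_mcfadden(
    [1.0, -0.5], [[0.01, 0.0], [0.0, 0.01]],
    [1.2, -0.4], [[0.02, 0.0], [0.0, 0.02]],
)
print(stat, df)
```

Under the null hypothesis that IIA holds, a statistic of this form is compared with a chi-squared distribution whose degrees of freedom equal the number of coefficients compared.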



TABLE 8: Elasticity at means using model (2) results 







                       Last view    Last view    Last search  Advertising  Media        Same Email   Link
                       time         time squared failed                    Mentions


Altavista




-0 001 326 


4 69F-0S 


-0 06298 


007041 


001247 

\J,\J\J 1 Z/T- / 


ooos? 


00314 


AOL 


0.1688 


-0 00123 


6 69F-05 


-0 05342 




012136 


006893 


05108 


Excite    0.2097       -0.001089    0.000087     -0.05462     0.006836     0.002765     0.006136     0.04827
Iwon      0.1164       -0.000906    6.94E-05     -0.02909     1.60E-07     0.000278     0.002397     0.01332
Lycos     0.0807       -0.001195    5.08E-05     -0.09324     0.009032     0.004264     0.008246     0.03435
MSN       0.5632       -0.001499    3.79E-05     -0.0717      0.001382     0.001278     0.047193     0.07109
Netscape  0.3989       -0.001329    0.000114     -0.05793     0.001109     0.0021       0.009134     0.06539
Yahoo     0.7683       -0.000995    7.58E-05     -0.02904     0.007523     0.005729     0.030408     3.41E-04
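
Table 8 reports elasticities evaluated at variable means. For a multinomial logit model with a linear utility index, the textbook own elasticity of the choice probability P_j with respect to a variable x_j is beta * x_j * (1 - P_j), and the cross elasticity with respect to another alternative's variable x_k is -beta * x_k * P_k. The sketch below assumes these standard forms. The paper's exact computation is not shown in this section, and the function names and inputs are hypothetical.

```python
import math


def logit_shares(utilities):
    """Multinomial logit choice probabilities from a list of utility indices."""
    top = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def own_elasticity(beta, x_mean, share):
    """Elasticity of P_j with respect to its own variable x_j, at the mean."""
    return beta * x_mean * (1.0 - share)


def cross_elasticity(beta, x_mean_other, share_other):
    """Elasticity of P_j with respect to another alternative's variable x_k."""
    return -beta * x_mean_other * share_other


# Hypothetical inputs: two alternatives with equal utility.
shares = logit_shares([0.0, 0.0])
print(own_elasticity(2.0, 0.5, shares[0]))
print(cross_elasticity(2.0, 0.5, shares[1]))
```

Because the own elasticity scales with (1 - P_j), the same coefficient produces a smaller elasticity for an alternative with a larger share, all else equal.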



TABLE 9: Increase in number of site visits over sample period due to small changes in variable 





            Increase advertising    One more media    Searches take one         Searches fail 1%    Links used 1%
            by one dollar           mention           second less on average    less often          more often
AltaVista   6.542                   23,210            11,946                    622,239             31,0238
AOL         6.222                   2,291,220         14,290                    582,787             557,2597
Excite      22.26                   290,137           15,107                    706,186             624,0876
Iwon        0.02825                 13,815            4,213                     205,620             94,1511
Lycos       3.383                   33,483            7,304                     548,294             201,9935
MSN         20.59                   1,113,855         53,077                    2,962,759           2,937,5517
Netscape    14.87                   5,909,102         31,045                    1,542,696           1,741,3581
Yahoo       26.53                   3,145,037         85,688                    2,418,358           28,3978



*Assumes 76.5 million total web users. This is Nielsen/Netratings' estimate of the total number of users in February 2000.



